To stem or lemmatize a highly inflectional language in a probabilistic IR environment?

Authors

  • Kimmo Kettunen
  • Tuomas Kunttu
  • Kalervo Järvelin
Abstract

Effects of three different morphological methods for Finnish (lemmatization, stemming and inflectional stem generation) are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale, which is partitioned differently in different test settings. Results show that inflectional stem generation, which has not been used much in IR, compares well with lemmatization in a best-match IR environment. Differences in performance between inflectional stem generation and lemmatization are small, and they are not statistically significant in most of the tested settings. It is also shown that stemming, a hitherto rather neglected method of morphological processing for Finnish, performs reasonably well, although the stemmer used, a Porter stemmer implementation, is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested.
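As a rough illustration of what partitioning a four-point relevance scale can mean in practice, the sketch below turns graded judgments into binary relevance at two different cut-offs and computes precision for one ranking. The 0 to 3 grade coding, the document identifiers, the function names and the choice of precision at k are assumptions made for this example; the abstract does not say which partitions or measures the authors used.

```python
from typing import Dict, List, Set


def partition_relevant(judgments: Dict[str, int], min_grade: int) -> Set[str]:
    """Return the documents counted as relevant for a given cut-off.

    Assumption: grades are coded 0 (not relevant) to 3 (highly relevant).
    """
    return {doc for doc, grade in judgments.items() if grade >= min_grade}


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are in the relevant set."""
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


# Hypothetical judgments and ranking, invented for this example only.
judgments = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
ranking = ["d1", "d2", "d3", "d4"]

liberal = partition_relevant(judgments, min_grade=1)    # grades 1-3 count
stringent = partition_relevant(judgments, min_grade=3)  # only grade 3 counts

liberal_p4 = precision_at_k(ranking, liberal, k=4)
stringent_p4 = precision_at_k(ranking, stringent, k=4)
```

The same ranking scores differently depending on the cut-off, which is why results for one morphological method can be reported separately for each partition of the scale.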

Similar resources

Designing a NooJ Module for Turkish Inflectional Analysis: an Example of Highly Productive Morphology

Turkish is a highly inflectional language that represents an interesting challenge to traditional corpus processing techniques. We present here the design of a basic module that allows NooJ users to lemmatize and perform morphological analysis on Turkish texts.

The Inflectional “-y” at the End of Imperative Verb in Middle (Dari) Persian

In Early Modern Persian prose and verse, verbs with a present stem accompanied by grapheme "-y" have been used to express modal concepts of imperative and command or invocation and request. Researchers, regardless of the historical changes of Early Modern Persian, believe that this structure of subjunctive 2nd person singular has been used to express imperative mood, and that ...

Developing an automatic linguistic truncation operator for best-match retrieval of Finnish in inflected word form text database indexes

The paper presents a new method for handling of morphological variation of query terms in best-match IR. The method is based on enhanced inflectional stems. Use of inflectional stems has earlier been shown to be a good retrieval method in inflected indexes in a best-match environment for a highly inflected and compound-rich language, Finnish. In this paper the earlier stem method is elaborated ...

A Study of Inflectional Categories of Noun in Sistani Dialect

The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of noun in Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data was collected via recording free speech, and interviewing with 30 (15 females, 15 males) illiterate Sistani language consultants of age 40–102 year...

Journal title:
  • Journal of Documentation

Volume 61  Issue 

Pages  -

Publication date 2005